Array-based Spectro-temporal Masking for Automatic Speech Recognition
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering
Abstract
Over the years, a variety of array processing techniques have been applied to the problem of enhancing degraded speech to improve automatic speech recognition. In this context, linear beamforming has long been the approach of choice, for reasons including good performance, robustness and analytical simplicity. While various nonlinear techniques (typically based to some extent on the study of auditory scene analysis) have also been of interest, they tend to lag behind their linear counterparts in terms of simplicity, scalability and flexibility. Nonlinear techniques are also more difficult to analyze and lack the systematic descriptions available for linear beamformers. This work focuses on a class of nonlinear processing known as time-frequency (T-F) masking, a.k.a. spectro-temporal masking, whose variants comprise a significant portion of the existing techniques. T-F masking accepts or rejects individual time-frequency cells according to some estimate of local signal quality. Analyses are developed that attempt to mirror the beam patterns used to describe linear processing, leading to a view of T-F masking as “nonlinear beamforming”. Two distinct formulations of these “nonlinear beam patterns” are developed, based on different metrics of the algorithms' behavior; these formulations are modeled in a variety of scenarios to demonstrate the flexibility of the idea. While these patterns are not quite as simple or all-encompassing as traditional beam patterns in microphone-array processing, they do accurately represent the behavior of masking algorithms in analogous and intuitive ways. In addition to analyzing this class of nonlinear masking algorithms, we also attempt to improve its performance in a variety of ways. Improvements are proposed to the baseline two-channel version of masking, addressing both the mask estimation and the signal reconstruction stages, with the reconstruction improvements proving more successful than the estimation improvements. 
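The abstract does not say which local-quality cue drives the mask, so the following Python sketch assumes one common two-channel choice: the inter-microphone phase difference of each T-F cell, converted to an arrival angle and compared with an acceptance region around broadside. All names (`estimate_mask`, `apply_mask`, `plane_wave`, `mask_beam_pattern`, `linear_beam_pattern`), the 4 cm spacing, the 343 m/s speed of sound and the ±20° threshold are illustrative assumptions. The "nonlinear beam pattern" computed here is simply the fraction of T-F cells that a single noiseless plane wave passes, as a function of its direction; it is one plausible metric in the spirit of the text, not either of the two formulations developed in the thesis. Spatial aliasing is ignored, which is valid for this spacing only up to about 4.3 kHz.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_mask(x1, x2, freqs_hz, spacing_m, max_angle_deg):
    """Binary T-F mask from the inter-microphone phase difference (hypothetical helper).

    x1, x2: STFTs of the two channels as lists of frames, each frame a list of
    complex bins aligned with freqs_hz. A cell is kept (1.0) when the arrival
    angle implied by its phase difference lies within +/- max_angle_deg of
    broadside, and rejected (0.0) otherwise.
    """
    mask = []
    for frame1, frame2 in zip(x1, x2):
        row = []
        for f, a, b in zip(freqs_hz, frame1, frame2):
            if f <= 0.0 or a == 0 or b == 0:
                row.append(0.0)  # no usable phase information in this cell
                continue
            # Phase of x1 relative to x2 equals 2*pi*f*tau, where tau is the mic-2 lag.
            phi = cmath.phase(a * b.conjugate())
            sin_theta = phi * SPEED_OF_SOUND / (2.0 * math.pi * f * spacing_m)
            if abs(sin_theta) > 1.0:
                row.append(0.0)  # physically inconsistent delay: reject
                continue
            angle = math.degrees(math.asin(sin_theta))
            row.append(1.0 if abs(angle) <= max_angle_deg else 0.0)
        mask.append(row)
    return mask


def apply_mask(x1, x2, mask):
    """Reconstruct in the STFT domain: average the channels, then apply the mask.
    The inverse STFT back to a waveform is omitted."""
    return [[m * 0.5 * (a + b) for m, a, b in zip(mrow, r1, r2)]
            for mrow, r1, r2 in zip(mask, x1, x2)]


def plane_wave(angle_deg, freqs_hz, spacing_m):
    """One noiseless STFT frame per channel for a far-field source at angle_deg."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    x1 = [[1.0 + 0j for _ in freqs_hz]]
    x2 = [[cmath.exp(-2j * math.pi * f * tau) for f in freqs_hz]]
    return x1, x2


def mask_beam_pattern(angle_deg, freqs_hz, spacing_m, max_angle_deg):
    """Assumed 'nonlinear beam pattern': fraction of cells the mask passes for one source."""
    x1, x2 = plane_wave(angle_deg, freqs_hz, spacing_m)
    mask = estimate_mask(x1, x2, freqs_hz, spacing_m, max_angle_deg)
    return sum(mask[0]) / len(freqs_hz)


def linear_beam_pattern(angle_deg, freqs_hz, spacing_m):
    """Mean magnitude response of a broadside-steered two-mic delay-and-sum beamformer."""
    x1, x2 = plane_wave(angle_deg, freqs_hz, spacing_m)
    return sum(abs(0.5 * (a + b)) for a, b in zip(x1[0], x2[0])) / len(freqs_hz)


if __name__ == "__main__":
    freqs = [125.0 * k for k in range(1, 33)]  # 125 Hz to 4 kHz; DC excluded
    for angle in (0, 15, 30, 45, 60, 90):
        print(angle,
              round(linear_beam_pattern(angle, freqs, 0.04), 3),
              mask_beam_pattern(angle, freqs, 0.04, 20.0))
```

In this noiseless single-source case the masking pattern is a hard step (every cell passes inside ±20°, none outside), whereas the delay-and-sum response falls off smoothly with angle. With noise or competing sources the per-cell phase estimates would scatter, so the retained fraction would be expected to vary more gradually; modeling such scenarios is where a richer pattern definition becomes necessary.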
Furthermore, while these approaches have been shown to outperform linear beamforming in two-sensor arrays, extensions to larger arrays have been few and unsuccessful. We find that combining beamforming and masking is a viable method of bringing the benefits
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
The effectiveness of culture-based assertiveness training on the self-esteem of children of divorce
Brever, M.M. (2010). The effects of child gender and child age at the time of parental divorce on the development. College of Social and Behavioral Sciences, Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, Psychology, Educational Track.
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interaction. The aim of an SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features are extracted from the spectrogram ...
Publication date: 2014